Installing and Configuring Stan in R

Author

A. Jordan Nafa

Published

August 30, 2022

Getting Started with Stan

Much of this course is taught from a primarily Bayesian perspective, which provides a principled and intuitive framework for quantitative analysis and probabilistic reasoning. For most of the applications we cover, we will use the R package {brms}, which provides a user-friendly and computationally efficient interface to Stan’s implementation of the No-U-Turn Sampler, an adaptive Hamiltonian Monte Carlo algorithm (Bürkner 2017, 2018; Carpenter et al. 2017; Hoffman and Gelman 2014). After you have installed R, Rtools/Xcode, and RStudio as detailed on the “Getting Started with R” page, this guide will walk you through installing Stan, {brms}, and the necessary dependencies. You can download all of the code shown in this document as a script, or copy and paste the code shown here into your RStudio session.

Preliminaries

To begin, I recommend setting some global options for the R session as shown below and setting the MAKEFLAGS system variable to enable multi-core compilation which will help speed up installation time. Note, however, that if you restart your R session this will reset the global options to their defaults so you may need to run this particular code block more than once during the installation process.

# Set Session Options
options(
  digits = 6, # Significant figures output
  scipen = 999, # Disable scientific notation
  repos = getOption("repos")["CRAN"] # Install packages from CRAN
)

# Set the makeflags to use multiple cores for faster compilation
Sys.setenv(
  MAKEFLAGS = paste0(
    "-j", 
    parallel::detectCores(logical = FALSE)
    ))
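One caveat: parallel::detectCores() is documented to return NA when the number of cores cannot be determined, which would produce the invalid flag -jNA. The following is a minimal sketch of a more defensive variant that falls back to a single core and then reads the variable back to confirm it was set. The n_cores name and the single-core fallback are assumptions made for illustration, not part of the original setup.

```r
# Detect the number of physical cores; this can be NA on some platforms
n_cores <- parallel::detectCores(logical = FALSE)

# Fall back to a single core if detection failed (assumed safe default)
if (is.na(n_cores)) {
  n_cores <- 1L
}

# Set the flag and read it back to confirm it took effect
Sys.setenv(MAKEFLAGS = paste0("-j", n_cores))
Sys.getenv("MAKEFLAGS")
```

If the value printed is of the form -j followed by a number, the setting is in place for the current session.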

After setting the global options for the session, we will check if any existing Stan packages are installed (unlikely, but it is good to be safe) and if so, remove them. After this is done, the next block checks that the packages required for subsequent steps in the installation process are installed and installs them if they are not already present. In passing, notice how the code below is wrapped in a pair of curly braces. In R, this means that everything inside the {...} is parsed as a single expression before it is evaluated, rather than line by line, which is useful for writing if, else if, and else statements outside of functions.

# Check if any existing Stan packages are installed
{
  ## Check for existing installations
  stan_packages <- installed.packages()[
    grepl("cmdstanr|rstan$|StanHeaders|brms$", 
          installed.packages()[, 1]), 1]
  
  ## Remove whichever Stan packages were found above; passing a package
  ## that is not installed to remove.packages() would raise an error
  if (length(stan_packages) > 0) {
    remove.packages(unique(stan_packages))
  }
  
  ## Delete any pre-existing RData file
  if (file.exists(".RData")) {
    file.remove(".RData")
  }
}

# Check if packages necessary for later installation steps are installed
{
  ## Packages required for later steps
  required <- c("rstudioapi", "remotes")
  
  ## Identify which required packages are missing (exact name match)
  missing_pkgs <- setdiff(required, rownames(installed.packages()))
  
  ## Install any packages that are missing
  if (length(missing_pkgs) > 0) {
    print(paste0("Installing the {", missing_pkgs, "} package"))
    install.packages(missing_pkgs)
  }
  
  ## Else print a message
  else {
    print("{remotes} and {rstudioapi} packages are already installed")
  }
}
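To illustrate the point about curly braces made earlier, here is a small sketch using made-up values (x and size are purely illustrative). At the top level of a script, R evaluates each complete expression as soon as it has been parsed, so an else that begins on its own line is a syntax error. Wrapping the statements in braces makes R read the whole block first, which lets the else attach to the preceding if.

```r
# Illustrative value only
x <- 5

# Without the outer braces, the `else` on its own line would fail to parse
# at top level because the `if` statement is already complete
{
  if (x > 10) {
    size <- "large"
  }
  else {
    size <- "small"
  }
}

print(size)
```

The same layout works inside a function body without the extra braces, since the function's own braces serve the same purpose.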

If you are on a computer with a Windows operating system and you followed the instructions on the “Getting Started with R” page, it should not be necessary to manually configure the C++ toolchain. For macOS users, the above code should work as long as you are on a recent version of Catalina. If you run into errors during installation or subsequent compilation, consult the documentation for configuring the C++ toolchain on Macs and notify the instructor or teaching assistant as soon as possible so we can get them resolved.

Installing rstan and brms

Once we have installed the necessary packages using the code in the previous section, we can install rstan, the main R interface to Stan, along with the required headers for the Stan math library. Since the StanHeaders package is a dependency of rstan, installing rstan using the code below will install both rstan and StanHeaders (Stan Development Team 2022a, 2022b).

# Install the development versions of rstan and StanHeaders
install.packages(
  pkgs = "rstan", 
  repos = c(
    "https://mc-stan.org/r-packages/", 
    getOption("repos")
    ))

To check the installation was successful and everything is working properly in the back-end, you can execute the following code in R. Once you have verified everything runs without any errors, this is a good time to restart your R session before proceeding to the next step.

# Fit a simple example model to check the Stan compiler is working
example(stan_model, package = "rstan", run.dontrun = TRUE)

# You can also manually restart via RStudio's GUI
rstudioapi::restartSession()

Next, we will proceed to installing the brms package. To get the most recent development version we will use the install_github function from the remotes package as shown below.

Note

If you run into issues installing brms from GitHub, you can try installing the pre-compiled release from CRAN using install.packages("brms") instead and see if that works.

# Install the latest development version of brms from github
remotes::install_github("paul-buerkner/brms")

If you are prompted to update existing R packages, type 1 in the console and press enter to proceed. An additional window may appear asking if you would like to compile more recent versions of some packages to be updated from source in which case you should choose “no” as doing so may cause the brms installation to fail. If the package installs without any errors you can proceed to the next step.
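Before moving on, it can be useful to confirm which of the packages from this section are actually present in your library. The sketch below uses only base R; the object names stan_pkgs and installed are illustrative. It reports TRUE or FALSE for each package rather than raising an error when one is missing.

```r
# Packages this guide has installed so far
stan_pkgs <- c("StanHeaders", "rstan", "brms")

# TRUE for each package found in the library, FALSE otherwise
installed <- stan_pkgs %in% rownames(installed.packages())
names(installed) <- stan_pkgs

installed
```

If any entry is FALSE, rerunning the corresponding installation step above is the natural next thing to try.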

Installing cmdstanr and cmdstan

The brms package provides the option to allow you to use cmdstanr, a light-weight alternative to rstan, as a back-end instead of rstan (Gabry and Češnovar 2022). This makes it possible to use the latest version of the Stan math libraries and cmdstan. Since rstan development tends to lag behind Stan, this will often yield substantial performance gains by allowing you to utilize the latest updates to the Stan language and can have the added bonus of being more stable on certain operating systems.

First, we will start by installing the cmdstanr package from github using the same approach we used to install brms in the previous section.

# Install cmdstanr from github
remotes::install_github("stan-dev/cmdstanr")

Once we have successfully installed cmdstanr, we can use the check_cmdstan_toolchain function with the fix argument set to TRUE to check if the C++ toolchain needs to be configured further and if so, automatically apply the correct configuration.

# Check that the C++ Toolchain is Configured
cmdstanr::check_cmdstan_toolchain(fix = TRUE)
  The C++ toolchain required for CmdStan is setup properly!

After verifying the toolchain configuration is correct, we can run the following code to download and compile the latest release of cmdstan, which at the time of writing this tutorial is version 2.30.1.

# Install cmdstan version 2.30.1
cmdstanr::install_cmdstan(
  cores = parallel::detectCores(logical = FALSE),
  overwrite = TRUE,
  cpp_options = list("STAN_THREADS" = TRUE),
  check_toolchain = TRUE
)

If cmdstan compiles without any errors, you should be able to verify the installation and ensure the path directory has been correctly set by running the following code.

# Verify that cmdstan installed successfully
(cmdstan.version <- cmdstanr::cmdstan_version())
  [1] "2.30.1"
# Ensure cmdstan path is set properly
cmdstanr::set_cmdstan_path(
  path = paste(
    Sys.getenv("HOME"), 
    "/.cmdstan/cmdstan-", 
    cmdstan.version,
    sep = ""
    ))
  CmdStan path set to: E:/Users/Documents/.cmdstan/cmdstan-2.30.1

As the output shows, cmdstan has been successfully installed to the directory E:/Users/Documents/.cmdstan/cmdstan-2.30.1.
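The path passed to set_cmdstan_path above is simply the home directory, a .cmdstan folder, and a versioned cmdstan folder joined together. As a sketch of the same construction, file.path() builds the path from its components and inserts the separators for you. The version string is hard-coded here as an assumption so the example runs without cmdstanr; in practice it would come from cmdstanr::cmdstan_version().

```r
# Version string hard-coded for illustration only
example_version <- "2.30.1"

# Join the components with the platform's path separator
cmdstan_dir <- file.path(
  Sys.getenv("HOME"),
  ".cmdstan",
  paste0("cmdstan-", example_version)
)

# Show the assembled path and check whether it exists on this machine
cmdstan_dir
dir.exists(cmdstan_dir)
```

If dir.exists() returns FALSE, cmdstan was likely installed to a different location, in which case cmdstanr::cmdstan_path() reports the directory currently in use.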

For those on a Windows operating system, the final step in the installation process is to set the path environment variable for the Intel TBB library which we can do by running the code shown below.

# Execute `mingw32-make install-tbb` in the terminal
rstudioapi::terminalExecute(
  command = "mingw32-make install-tbb",
  workingDir = cmdstanr::cmdstan_path()
  )

# Reset the terminal
rstudioapi::terminalKill(id = rstudioapi::terminalList())

Note that for this change to take effect, you will need to close and reopen RStudio after executing the terminal command before proceeding to the next section.

Congratulations, you have successfully installed Stan. No further steps are required on macOS. You can proceed to the verification section below, though I suggest closing and reopening RStudio before doing so.

Verifying the Installation

Finally, to verify that the installation was successful and everything works correctly, we can fit a simple linear model using brms as shown below. For our purposes here, we will use the built-in mtcars data and model fuel efficiency (mpg) as a linear function of weight (wt) and the natural log of horsepower (hp). Formally, we can express this model as

\[
\begin{align*}
y_{i} &\sim \mathcal{N}(\mu_{i}, \sigma^{2})\\
\mu_{i} &= \alpha + \beta_{1} \times \text{Weight}_{i} + \beta_{2} \times \log(\text{Horsepower}_{i})\\
&\text{with priors}\\
\alpha &\sim \text{Student-}t(10, ~20.09, ~2)\\
\beta_{1} &\sim \text{Normal}(0, ~9.24)\\
\beta_{2} &\sim \text{Normal}(0, ~12.68)\\
\sigma &\sim \text{Exponential}(1)
\end{align*}
\]

Don’t worry if you do not understand the math expressed here (which I assume is the case for most if not all students), as this will all be explained in detail as we progress through the semester.
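To make the prior statements above a little more concrete, the following sketch draws values from each prior using base R only, so it does not require Stan. It assumes the Student t prior is read as (degrees of freedom, location, scale), which is how Stan parameterizes it, and that the second argument of each Normal prior is a standard deviation. The object names and the number of draws are illustrative.

```r
# Fix the seed so the draws are reproducible
set.seed(2022)
n_draws <- 1000

# Student t with 10 degrees of freedom, shifted to 20.09 and scaled by 2
alpha <- 20.09 + 2 * rt(n_draws, df = 10)

# Normal priors on the two slopes (second argument is the standard deviation)
beta_wt <- rnorm(n_draws, mean = 0, sd = 9.24)
beta_log_hp <- rnorm(n_draws, mean = 0, sd = 12.68)

# Exponential prior on the residual standard deviation
sigma <- rexp(n_draws, rate = 1)

# Collect the draws and summarize each column
prior_draws <- data.frame(alpha, beta_wt, beta_log_hp, sigma)
summary(prior_draws)
```

Looking at the range of each column gives a rough sense of which parameter values the priors treat as plausible before seeing any data.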

# Load the brms library
library(brms)

# Load the built-in mtcars data
data("mtcars")
## Take the log of horsepower
mtcars$log_hp <- log(mtcars$hp)
  
## Specify some weakly informative priors for the model parameters
mpg_priors <- prior(student_t(10, 20.09, 2), class = Intercept) +
  prior(normal(0, 9.24), class = b, coef = wt) +
  prior(normal(0, 12.68), class = b, coef = log_hp) +
  prior(exponential(1), class = sigma)

## Fit the model
bayes_mpg_fit <- brm(
  formula = mpg ~ wt + log_hp, # Formula describing the model
  family = gaussian(), # Linear regression
  prior = mpg_priors, # Priors on the parameters
  data = mtcars, # Data for the model
  cores = 4, # Number of cores to use for parallel chains
  chains = 4, # Number of chains, should be at least 4
  iter = 2000, # Total iterations = Warm-Up + Sampling
  warmup = 1000, # Warm-Up Iterations
  refresh = 0, # Disable printing progress
  save_pars = save_pars(all = TRUE),
  backend = "cmdstanr", # Requires cmdstanr and cmdstan be installed
  silent = 2 # Set to 0 or 1 to print compiler messages
)
  Running MCMC with 4 parallel chains...
  
  Chain 1 finished in 0.1 seconds.
  Chain 2 finished in 0.1 seconds.
  Chain 3 finished in 0.0 seconds.
  Chain 4 finished in 0.1 seconds.
  
  All 4 chains finished successfully.
  Mean chain execution time: 0.1 seconds.
  Total execution time: 0.3 seconds.
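The call above hard-codes cores = 4 and backend = "cmdstanr". As an optional sketch, brm() takes its default for cores from the mc.cores option and its default for backend from the brms.backend option, so both can be set once per session instead. The n_chains and n_cores names, and the choice to cap cores at the number of chains, are assumptions made for this example.

```r
# Cap the number of cores at the number of chains; fall back to one core
# if the core count cannot be detected
n_chains <- 4L
n_cores <- parallel::detectCores(logical = FALSE)
if (is.na(n_cores)) n_cores <- 1L

# Session-wide defaults picked up by later brm() calls
options(
  mc.cores = min(n_chains, n_cores),
  brms.backend = "cmdstanr"
)
```

As with the other global options in this guide, these reset when the R session restarts.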

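If everything was installed and configured successfully, sampling should finish in well under a second (about 0.3 seconds in the output above), although compiling the model beforehand takes noticeably longer. You can then obtain a summary of the results using the summary function.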

# Print a summary of the fitted model
summary(bayes_mpg_fit)
   Family: gaussian 
    Links: mu = identity; sigma = identity 
  Formula: mpg ~ wt + log_hp 
     Data: mtcars (Number of observations: 32) 
    Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
           total post-warmup draws = 4000
  
  Population-Level Effects: 
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
  Intercept    59.23      5.02    49.31    69.52 1.00     3045     2451
  wt           -3.31      0.62    -4.54    -2.09 1.00     2681     2078
  log_hp       -5.84      1.28    -8.39    -3.27 1.00     2608     2232
  
  Family Specific Parameters: 
        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
  sigma     2.34      0.30     1.85     3.01 1.00     2696     2664
  
  Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
  and Tail_ESS are effective sample size measures, and Rhat is the potential
  scale reduction factor on split chains (at convergence, Rhat = 1).
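As a rough cross-check that does not involve Stan at all, the same formula can be fit by ordinary least squares with lm(). This is only a sanity check and not part of the installation: with priors this weak, the posterior means would be expected to land close to, but not exactly on, the least-squares estimates. The object name ols_mpg_fit is illustrative.

```r
# Rebuild the logged horsepower variable so the example is self-contained
data("mtcars")
mtcars$log_hp <- log(mtcars$hp)

# Fit the same linear model by ordinary least squares
ols_mpg_fit <- lm(mpg ~ wt + log_hp, data = mtcars)

# Coefficients to compare against the Estimate column above
round(coef(ols_mpg_fit), 2)
```

If the two sets of estimates differ wildly in sign or magnitude, that is a hint that something went wrong in the data preparation or model specification rather than in the Stan installation itself.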

Additional Resources for Getting Started with R and Stan

  • Bayes Rules! An Introduction to Applied Bayesian Modeling

    • An introductory text to applied Bayesian statistics designed to be accessible to undergraduates without a strong background in programming or advanced statistics.
  • Stan Discourse Forums

    • Official forums for the Stan programming language. Useful resource for seeking help with troubleshooting any installation issues and other questions related to Stan and its interfaces.

Session Information

  ─ Session info ───────────────────────────────────────────────────────────────
   setting  value
   version  R version 4.2.1 (2022-06-23 ucrt)
   os       Windows 10 x64 (build 19044)
   system   x86_64, mingw32
   ui       RTerm
   language (EN)
   collate  English_United States.utf8
   ctype    English_United States.utf8
   tz       America/Chicago
   date     2022-08-30
   pandoc   2.19 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
  
  ─ Packages ───────────────────────────────────────────────────────────────────
   ! package   * version date (UTC) lib source
     base      * 4.2.1   2022-06-23 [?] local
     brms      * 2.17.5  2022-08-11 [1] Github (paul-buerkner/brms@00c6f66)
   P datasets  * 4.2.1   2022-06-23 [2] local
   P graphics  * 4.2.1   2022-06-23 [2] local
   P grDevices * 4.2.1   2022-06-23 [2] local
   P methods   * 4.2.1   2022-06-23 [2] local
     Rcpp      * 1.0.9   2022-07-08 [1] CRAN (R 4.2.1)
   P stats     * 4.2.1   2022-06-23 [2] local
   P utils     * 4.2.1   2022-06-23 [2] local
  
   [1] C:/Users/ajn0093/AppData/Local/R/win-library/4.2
   [2] C:/Program Files/R/R-4.2.1/library
  
   P ── Loaded and on-disk path mismatch.
  
  ──────────────────────────────────────────────────────────────────────────────

References

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80: 1–28.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10: 395–411.
Carpenter, Bob et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76.
Gabry, Jonah, and Rok Češnovar. 2022. cmdstanr: R Interface to ’CmdStan’.
Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15: 1593–623.
Stan Development Team. 2022a. “RStan: The R Interface to Stan.” https://mc-stan.org/.
———. 2022b. “StanHeaders: Headers for the R Interface to Stan.” https://mc-stan.org/.